Multi-user canonicalization


During application development, it is sometimes
useful to guarantee that your code does not create
two objects that are logically equivalent. Instead,
you would like the attempt to create a new object to
return an existing, equivalent object, if one exists.
Otherwise, the new object can be created and registered
so that subsequent attempts to create an
equivalent object will return this one. This technique,
called canonicalization, makes your code more efficient by
eliminating redundant objects and allowing you to take
advantage of object identity. For example, if you know that
there will only be a single object to represent some logical
entity, you can use identity comparisons (== or ~~) when
scanning for the presence of the object in some collection.
Identity comparisons are usually more efficient
because they are typically in-lined by the compiler and do
not require fetching the objects to return an answer.

Most Smalltalkers are already familiar with the concept
of canonicalization through the use of symbols. By definition,
symbols are guaranteed to be unique, so that any symbol
with the same sequence of characters will have the same
identity. This means that no matter where or how a symbol
is created, an identity comparison of two equivalent
symbols always returns true. In fact, the implementation
of = for class Symbol is the same as ==.

The uniqueness of symbols allows them to be used in
fast identity-based collections, such as a key in an identity
dictionary, while preserving the semantics of equality
look-up. This is one reason why method selectors are
symbols rather than strings, since they are used as keys in
a method dictionary.
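
The difference between strings and symbols can be seen in a workspace. This is a minimal sketch using only standard protocol; the variable names and the string stored as a stand-in for a compiled method are made up for illustration.

```smalltalk
| s1 s2 sym1 sym2 methods |
s1 := 'printOn:' copy.
s2 := 'printOn:' copy.
s1 = s2.        "true: same sequence of characters"
s1 == s2.       "false: two distinct string objects"

sym1 := s1 asSymbol.
sym2 := s2 asSymbol.
sym1 == sym2.   "true: both refer to the one canonical symbol"

"An identity-keyed look-up still behaves like an equality look-up"
methods := IdentityDictionary new.
methods at: sym1 put: 'stand-in for a compiled method'.
methods at: sym2 ifAbsent: [ nil ].   "finds the entry stored under sym1"
```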

A common usage for canonicalization is to implement
a smart cache of objects whose state is derived
from an external system. For example, if objects are
being materialized from a relational database, then a
cache typically maps a relational primary key value to its
corresponding Smalltalk object. If some part of the
application needs an object with a particular key value,
the cache is consulted first. If there is already an entry in
the cache for that particular key, then the application
can avoid having to execute time-consuming code to
communicate with the relational database and perform
the relational-to-object mapping, since it has already
been done before (of course, if the relational data has
been modified since the initial caching occurred, then
the cache must somehow be updated or invalidated, but
that is a different problem).
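
The look-up just described might be sketched as follows. Everything named here is hypothetical: EmployeeCache is an assumed class whose instance variable cache holds a Dictionary, and materializeFromDatabase: is an assumed helper that runs the query and performs the relational-to-object mapping.

```smalltalk
method: EmployeeCache
employeeAtKey: aKey
    "Hypothetical sketch. Answer the cached object for aKey. Only
     when the key is absent do we pay for the database round trip,
     and the result is remembered for the next caller."

    ^ cache at: aKey ifAbsent: [ | emp |
        emp := self materializeFromDatabase: aKey.
        cache at: aKey put: emp.
        emp ]
%
```

Invalidation is deliberately left out of the sketch, as it is in the discussion above.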

Building your own canonicalization mechanism is fairly
straightforward in a single-user Smalltalk system. A typical
implementation is to override the instance creation
method to check for the presence of an existing, equivalent
object before creating a new one. A common implementation
is to maintain a dictionary in a class variable,
where the keys of the dictionary are the logical values
upon which equivalence is determined, and the values of
the dictionary are instances of the class that have already
been created. I suggest using a class variable, rather than
a class instance variable, so that creating instances of
subclasses consults the same dictionary. Another advantage
of this implementation is that it is very easy to get all
instances of a class and its subclasses.

To illustrate this technique, here is the implementation
of an instance creation method for class Employee. In
addition to having instance variables for name and social
security number, Employee has a class variable called
"CanonDictionary" that is initialized to a dictionary. In this
model, social security number is the primary key upon
which equivalence is based, i.e., we never want object
memory to contain more than one instance of Employee
with the same social security number. Since we always
want an Employee to have a social security number, we
override the "new" method to raise an error, and require
instance creation to occur with the "name:ssn:" method
listed here:

classmethod: Employee
name: aName ssn: aSSN
    "Return an instance with the given name and ssn. If one
     does not exist in the canonicalization dictionary, create
     a new one; otherwise, return the existing one."

    ^ CanonDictionary at: aSSN ifAbsent: [ | emp |
        emp := self basicNew name: aName; ssn: aSSN; yourself.
        CanonDictionary at: aSSN put: emp.
        emp ]
%
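
A short usage sketch of the name:ssn: method, assuming CanonDictionary has been initialized and Employee defines the setters name: and ssn: as described. The name and number are made up for illustration.

```smalltalk
| first second |
first := Employee name: 'Pat Smith' ssn: '123-45-6789'.
second := Employee name: 'Pat Smith' ssn: '123-45-6789'.
first == second.   "true: the second call answers the instance
                    registered by the first"
```

Because the look-up is by social security number alone, a different name passed on the second call would be ignored and the existing instance returned unchanged.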


This technique works fine in single-user Smalltalk systems,
since only one user is creating objects in this
image. But in multi-user Smalltalk, there may be concurrent
users who are creating objects in a shared image.

This opens the door to the possibility that users will
experience concurrency conflicts on the canonicalization
dictionary. In addition, since each user operates with
their own transactionally consistent view of objects, there
may be more than one user who thinks he or she is creating
the first instance of an Employee with a particular
social security number. This is because neither user will
see the other's modifications until his or her transaction is
committed. At the very least, one of the users could experience
a concurrency conflict, but it could be worse if
both users were allowed to create logically equivalent
instances of Employee and the application code depended
upon their uniqueness.

Fortunately, by subclassing an existing specialized
multi-user class, this situation can be handled correctly. In
GemStone Smalltalk, the class RcHashDictionary provides
concurrency semantics that are close to what is needed
(see my column in the March-April 1995 issue of the
Smalltalk Report for a description of reduced conflict
classes). This multi-user dictionary allows concurrent
updaters and removers from the dictionary to perform
their operations without conflict, as long as they are using
different keys. For example, two concurrent users who
are performing at:put: operations with non-equivalent
keys will not experience concurrency conflicts. But in our
example, concurrent users might try to create instances
with the same social security number, so they would
experience conflict. What is needed is the ability to recognize
these conflicts, choose one of the instances to be the
canonical Employee with that social security number, and
replace all references to the noncanonical Employee with
references to the canonical Employee (allowing the
noncanonical Employee to be eventually garbage collected).

To solve this problem, I created a subclass of
RcHashDictionary, called RcCanonicalDictionary. This class only
needs to override one method to provide the desired
behavior; however, implementing this method requires an
understanding of how reduced conflict behavior is
achieved. When a user attempts to commit a transaction,
the underlying system detects whether there are physical
conflicts on objects, for example, by checking if this transaction
wrote an object that another concurrent transaction had
already written and committed. For most objects, a physical
conflict means the transaction cannot succeed.
However, special reduced conflict objects are
given a second chance to determine if the physical
conflict can logically be resolved.


This involves selectively updating the view of these
objects so that the committed modifications of other
users are visible, and then replaying the modifications of
the current transaction on the reduced conflict objects. If
the modifications can be replayed without failing, then
the transaction is allowed to commit successfully.

For RcHashDictionaries, the method that replays updates
to the dictionary is _replayAt:put:oldValue:. This
method is similar to at:put:, except that the third argument
is the original value at the given key before the
update occurred (this argument is nil if the entry was
added for the first time).

This allows the replay method to check if the value
before the update is the same during replay as it was
when the operation was originally invoked during the
transaction. When the operation is replayed, if the current
value is not the same as the old value, then we know some
concurrent user has updated the dictionary at this key
and we should fail the attempt to commit the transaction.
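
The check described above might look like the following. This is a hypothetical sketch, not the GemStone source: SketchRcDictionary is an invented class name, and the sketch assumes that answering false fails the commit while answering true lets it proceed. It relies on at:otherwise:, which the listings in this column also use.

```smalltalk
method: SketchRcDictionary
_replayAt: aKey put: aValue oldValue: oldValue
    "Hypothetical sketch of the replay check, under the
     assumptions stated in the surrounding text."

    | currentVal |
    currentVal := self at: aKey otherwise: nil.

    "a concurrent user changed this key after our original
     operation ran, so the replay cannot succeed"
    currentVal == oldValue ifFalse: [ ^ false ].

    "the key is unchanged, so the original update can be reapplied"
    self at: aKey put: aValue.
    ^ true
%
```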

For our new RcCanonicalDictionary, rather than fail the
transaction when another user commits a new entry at
the same key, we would like to forget the value we were
going to insert, and use the value that another user
already inserted. This involves swizzling all references to
the value we were about to insert to the new value inserted
by a concurrent user. Fortunately, this is not very hard
to do, since we can get a collection of all objects that were
written during the transaction, and scan them to find
references to our value. This avoids having to scan all of
object memory to find references, which is prohibitively
expensive when there is a large number of objects.

One thing that must be accounted for when swizzling
object references is to correctly update collections where
the position of an object in the collection is dependent
upon the identity of the object.

In GemStone Smalltalk, Bag and its subclasses use the
identity of their elements to determine their positions in the
internal implementation structures. Consequently, rather
than overwriting the reference to the old value in these
collections, the swizzling method first removes the old
value and then adds the new value to the collection.
Below are the methods to replay the insertion into an
RcCanonicalDictionary when a physical conflict is detected,
and the methods to swizzle references in general objects
and for Bags.

method: RcCanonicalDictionary
_replayAt: aKey put: aValue oldValue: oldValue
    "Stores the key/value pair in the dictionary. If there is
     already a value for the given key, then this method
     swizzles references to refer to the existing value."

    | existingVal |

    "see if there is now an existing entry (added by a
     concurrent user)"
    existingVal := self at: aKey otherwise: nil.

    "if there is no existing entry, update the dictionary;
     otherwise swizzle"
    existingVal isNil
        ifTrue: [ self at: aKey put: aValue ]
        ifFalse: [
            "for each object written during this transaction,
             swizzle references"
            (System _hiddenSetAsArray: 9) do: [ :obj |
                obj _swizzleReferencesFrom: aValue to: existingVal
            ]
        ].

    "return true to indicate that the transaction can proceed"
    ^ true
%

method: Object
_swizzleReferencesFrom: obj1 to: obj2
    "Scan the named instance variables and indexable portion
     of the receiver, looking for references to obj1. For any
     that are found, replace the reference with obj2."

    "first scan named inst vars"
    1 to: self class instSize do: [ :j |
        obj1 == (self instVarAt: j)
            ifTrue: [ self instVarAt: j put: obj2 ]
    ].

    "scan indexable portion if necessary"
    self class isIndexable
        ifTrue: [
            1 to: self _basicSize do: [ :j |
                obj1 == (self _at: j) ifTrue: [ self _at: j put: obj2 ]
            ]
        ]
%

method: Bag
_swizzleReferencesFrom: obj1 to: obj2
    "If obj1 is contained in the receiver, remove all
     occurrences of it, and add the same number of
     occurrences for obj2."

    "invoke superclass method for named instance variables"
    super _swizzleReferencesFrom: obj1 to: obj2.

    (self includes: obj1)
        ifTrue: [
            (self occurrencesOf: obj1) timesRepeat: [
                self remove: obj1.
                self add: obj2
            ]
        ]
%

Canonicalization of objects is a useful technique with
many applications. In a multi-user environment,
canonicalization mechanisms must take into account
concurrent users creating equivalent objects.

This column has demonstrated one approach for solving
this problem using the power and extensibility of
multi-user Smalltalk.